-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gemini: lazy generation of partition keys #200
Conversation
6198af0
to
29eb668
Compare
ce5f50d
to
2c1bb39
Compare
source.go
Outdated
} | ||
|
||
// giveOld returns the supplied value for later reuse unless the value | ||
//is empty in which case it removes the corresponding token from the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space needed after the comment marker.
oneStdDev = 0.341 | ||
*/ | ||
stdDistMean = math.MaxUint64 / 2 | ||
oneStdDev = 0.341 * math.MaxUint64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks unrelated, and let's not add commented out code to the tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably skip it although I think there is some other code that depends on it. Yes I only left the code there while coding as an easy reminder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we introduced the TokenIndex
type, we can keep this change. But please remove the commented out constants.
cmd/gemini/root.go
Outdated
@@ -418,8 +428,8 @@ func init() { | |||
rootCmd.Flags().IntVarP(&maxRetriesMutate, "max-mutation-retries", "", 2, "Maximum number of attempts to apply a mutation") | |||
rootCmd.Flags().DurationVarP(&maxRetriesMutateSleep, "max-mutation-retries-backoff", "", 10*time.Millisecond, "Duration between attempts to apply a mutation for example 10ms or 1s") | |||
rootCmd.Flags().Uint64VarP(&pkBufferReuseSize, "partition-key-buffer-reuse-size", "", 2000, "Number of reused buffered partition keys") | |||
rootCmd.Flags().Uint64VarP(&distributionSize, "distribution-size", "", 1000000, "Number of partition keys each worker creates") | |||
rootCmd.Flags().StringVarP(&partitionKeyDistribution, "partition-key-distribution", "", "uniform", "Specify the distribution from which to draw partition keys, supported values are currently uniform|normal|exponential") | |||
rootCmd.Flags().Uint64VarP(&distributionSize, "distribution-size", "", math.MaxUint64, "Number of partition keys each worker creates") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks unrelated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll check it.
schema.go
Outdated
@@ -756,6 +758,9 @@ func (s *Schema) genSinglePartitionQuery(t *Table, source *Source, r *rand.Rand, | |||
if !ok { | |||
return nil | |||
} | |||
if len(values.Value) != len(t.PartitionKeys) { | |||
fmt.Printf("values=%d, pk=%d\n", len(values.Value), len(t.PartitionKeys)) | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like left-over debugging printout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed.
I would like to get rid of the lock sharding commit from this pull request, and defer the optimization. I think a lockless set is probably a cleaner approach, but I would like lazy generation to be merged first. |
The contract that the gemini.DistributionFunc adheres to has thus far been underspecified. This commits attempts to formalize what it is expected to return and why.
cmd/gemini/jobs.go
Outdated
for j, table := range schema.Tables { | ||
g := generators[j] | ||
delta := math.MaxUint64 % actors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the intent of delta
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's going to be removed. It is left from an intermediate state when I thought the partitions mattered here as well.
Hand wiring goroutine termination was messy. Using the package gopkg.in/tomb.v2 makes it much simpler, safer and nicer. This commit fixes a deadlock that happened sometimes diring heavy operations and external shutdown.
f27170f
to
583d711
Compare
d541e7d
to
1fc670c
Compare
Lazy partition key generation reintroduced to avoid out of memory issues.
Fixes: #200
Fixes: #199